AITopics

Country:

Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Asia > China > Hong Kong (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Neural Information Processing Systems, Feb-11-2026, 10:56:23 GMT

RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance

There is a rapidly growing interest in controlling consistency across multiple generated images using diffusion models.

large language model, machine learning, natural language, (17 more...)

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Neural Information Processing Systems, Dec-27-2025, 05:52:10 GMT

Text Alignment Is An Efficient Unified Model for Massive NLP Tasks

Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks, demanding an extreme scale of model parameters (tens or hundreds of billions) and sometimes yielding suboptimal performance. In practice, it is often desirable to build more efficient models: despite being less versatile, they still apply to a substantial subset of problems, delivering on-par or even superior performance with much smaller model sizes. In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. Given a pair of texts, the model measures the degree of alignment between their information.

efficient unified model, name change, text alignment, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing Systems, Oct-10-2025, 00:59:23 GMT

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation

In this work, we analyze the limitations of two primary techniques in text-to-image personalization: Textual Inversion and DreamBooth.

alignment, arxiv preprint arxiv, attention map, (13 more...)

Country:

Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Asia > China > Hong Kong (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Oct-9-2025, 23:45:07 GMT

3b057de5a2e38bd8fa10201866c20dbf-Paper-Conference.pdf

arxiv preprint arxiv, consistency, diffusion model, (13 more...)

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.46)
Leisure & Entertainment (0.46)
Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Mohamed, Mohamed, Nichyporuk, Brennan, Arnold, Douglas L., Arbel, Tal

Imagining Alternatives: Towards High-Resolution 3D Counterfactual Medical Image Generation via Language Guidance

arXiv.org Artificial Intelligence, Oct-2-2025

Vision-language models have demonstrated impressive capabilities in generating 2D images under various conditions; however, the impressive performance of these models in 2D is largely enabled by extensive, readily available pretrained foundation models. Critically, comparable pretrained foundation models do not exist for 3D, significantly limiting progress in this domain. As a result, the potential of vision-language models to produce high-resolution 3D counterfactual medical images conditioned solely on natural language descriptions remains completely unexplored. Addressing this gap would enable powerful clinical and research applications, such as personalized counterfactual explanations, simulation of disease progression scenarios, and enhanced medical training by visualizing hypothetical medical conditions in realistic detail. Our work takes a meaningful step toward addressing this challenge by introducing a framework capable of generating high-resolution 3D counterfactual medical images of synthesized patients guided by free-form language prompts. We adapt state-of-the-art 3D diffusion models with enhancements from Simple Diffusion and incorporate augmented conditioning to improve text alignment and image quality. To our knowledge, this represents the first demonstration of a language-guided native-3D diffusion model applied specifically to neurological imaging data, where faithful three-dimensional modeling is essential to represent the brain's three-dimensional structure. Through results on two distinct neurological MRI datasets, our framework successfully simulates varying counterfactual lesion loads in Multiple Sclerosis (MS) and cognitive states in Alzheimer's disease, generating high-quality images while preserving subject fidelity in synthetically generated medical images. Our results lay the groundwork for prompt-driven disease progression analysis within 3D medical imaging.

diffusion model, machine learning, natural language, (14 more...)

2509.05978

Country: North America > Canada > Quebec (0.15)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.89)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)

arXiv.org Artificial Intelligence, Oct-1-2025

Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

Shin, Chaehun, Choi, Jooyoung, Barthelemy, Johan, Lee, Jungbeom, Yoon, Sungroh

We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Existing supervised fine-tuning methods, which rely only on positive targets and use the diffusion loss as in the pre-training stage, often fail to capture fine-grained subject details. To address this, SFO introduces additional synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically produces synthetic negatives tailored for subject-driven generation by introducing controlled degradations that emphasize subject fidelity and text alignment without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus fine-tuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms recent strong baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/
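The abstract above says the model is guided to favor positives over negatives through pairwise comparison, but it does not state the loss. As a rough illustration only, the sketch below assumes a logistic (Bradley-Terry style) loss on the gap between the denoising errors of a positive and a negative target. The function name, the `beta` scale, and the use of scalar errors are all hypothetical and not taken from the paper.

```python
import math


def pairwise_preference_loss(err_pos: float, err_neg: float, beta: float = 1.0) -> float:
    """Hypothetical pairwise loss: small when the positive target has the
    lower denoising error, large when the negative target does."""
    # Positive margin means the model already prefers the positive target.
    margin = beta * (err_neg - err_pos)
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


# The loss is lower when the positive target is reconstructed better.
good = pairwise_preference_loss(err_pos=0.2, err_neg=0.8)
bad = pairwise_preference_loss(err_pos=0.8, err_neg=0.2)
print(good < bad)
```

With equal errors the margin is zero and the loss equals log 2, the point at which the model expresses no preference.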

large language model, machine learning, natural language, (17 more...)

2506.03621

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Aug-5-2025

What's in the News? Towards Identification of Bias by Commission, Omission, and Source Selection (COSS)

Zhukova, Anastasia, Ruas, Terry, Hamborg, Felix, Donnay, Karsten, Gipp, Bela

In a world overwhelmed with news, determining which information comes from reliable sources or how neutral the reported information in news articles is poses a challenge to news readers. In this paper, we propose a methodology for automatically identifying bias by commission, omission, and source selection (COSS) as a joint three-fold objective, as opposed to the previous work separately addressing these types of bias. In a pipeline concept, we describe the goals and tasks of its steps toward bias identification and provide an example of a visualization that leverages the extracted features and patterns of text reuse.

artificial intelligence, information, natural language, (16 more...)

doi: 10.1109/JCDL57899.2023.00050

2508.0254

Country:

Europe > Germany (0.30)
North America > United States > New Mexico (0.15)

Genre: Research Report (0.40)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.31)

arXiv.org Artificial Intelligence, Jul-10-2025

XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Gong, Yitian, Jin, Luozhijie, Deng, Ruifan, Zhang, Dong, Zhang, Xin, Cheng, Qinyuan, Fei, Zhaoye, Li, Shimin, Qiu, Xipeng

Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codecs in balancing semantic richness and acoustic fidelity. We propose XY-Tokenizer, a novel codec that mitigates the conflict between semantic and acoustic capabilities through multi-stage, multi-task learning. Experimental results demonstrate that XY-Tokenizer achieves performance in both semantic and acoustic tasks comparable to that of state-of-the-art codecs operating at similar bitrates, even though those existing codecs typically excel in only one aspect. Specifically, XY-Tokenizer achieves strong text alignment, surpassing distillation-based semantic modeling methods such as SpeechTokenizer and Mimi, while maintaining a speaker similarity score of 0.83 between reconstructed and original audio. The reconstruction performance of XY-Tokenizer is comparable to that of BigCodec, the current state-of-the-art among acoustic-only codecs, which achieves a speaker similarity score of 0.84 at a similar bitrate. Code and models are available at https://github.com/gyt1145028706/XY-Tokenizer.

large language model, machine learning, natural language, (18 more...)

2506.23325

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jul-4-2025

APT: Adaptive Personalized Training for Diffusion Models with Limited Data

Chae, JungWoo, Kim, Jiyoon, Choi, JaeWoong, Kim, Kyungyul, Hwang, Sangheum

Personalizing diffusion models using limited data presents significant challenges, including overfitting, loss of prior knowledge, and degradation of text alignment. Overfitting leads to shifts in the noise prediction distribution, disrupting the denoising trajectory and causing the model to lose semantic coherence. In this paper, we propose Adaptive Personalized Training (APT), a novel framework that mitigates overfitting by employing adaptive training strategies and regularizing the model's internal representations during fine-tuning. APT consists of three key components: (1) Adaptive Training Adjustment, which introduces an overfitting indicator to detect the degree of overfitting at each time step bin and applies adaptive data augmentation and adaptive loss weighting based on this indicator; (2) Representation Stabilization, which regularizes the mean and variance of intermediate feature maps to prevent excessive shifts in noise prediction; and (3) Attention Alignment for Prior Knowledge Preservation, which aligns the cross-attention maps of the fine-tuned model with those of the pretrained model to maintain prior knowledge and semantic coherence. Through extensive experiments, we demonstrate that APT effectively mitigates overfitting, preserves prior knowledge, and outperforms existing methods in generating high-quality, diverse images with limited reference data.
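Component (2) in the abstract above regularizes the mean and variance of intermediate feature maps. The abstract does not give the exact form, so the sketch below is only one plausible reading: it assumes the reference statistics come from the pretrained model and that the penalty is a per-channel sum of squared differences. The function name and the list-of-channels data layout are hypothetical.

```python
from statistics import fmean, pvariance


def stat_shift_penalty(finetuned: list[list[float]], pretrained: list[list[float]]) -> float:
    """Hypothetical regularizer: sum over channels of the squared difference
    in mean and in variance between two sets of feature-map activations.
    Each argument is a list of channels; each channel is a flat list of values."""
    penalty = 0.0
    for ft, pt in zip(finetuned, pretrained):
        penalty += (fmean(ft) - fmean(pt)) ** 2          # shift in mean
        penalty += (pvariance(ft) - pvariance(pt)) ** 2  # shift in variance
    return penalty


# One channel whose activations drift upward by 1.0 with unchanged spread:
# only the mean term contributes to the penalty.
print(stat_shift_penalty([[1.0, 3.0]], [[0.0, 2.0]]))
```

In a training loop such a term would be added to the diffusion loss with a weighting coefficient; that weight, and the choice of which layers to monitor, are left open here because the abstract does not specify them.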

artificial intelligence, knowledge, machine learning, (15 more...)

2507.02687

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)